Added support for compression on meta device #376
Conversation
Review threads (resolved) on:
src/compressed_tensors/compressors/model_compressors/model_compressor.py
src/compressed_tensors/compressors/quantized_compressors/base.py
src/compressed_tensors/compressors/quantized_compressors/pack_quantized.py
src/compressed_tensors/compressors/sparse_compressors/sparse_24_bitmask.py
Please make sure the test cases in folders ending in `_skipped` now pass: https://github.com/vllm-project/llm-compressor/tree/main/tests/llmcompressor/transformers/compression
Looking good! One comment, and one suggestion applied to 3 different lines.
Update …4_bitmask.py (3 commits applying review suggestions)
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
They're passing now. Logs are pasted in the PR description. 🫡
I don't really understand the changes to `pack_to_int32`. The original function doesn't actually use any explicit numpy calls (outside of `start` and `end`), so I don't see why the original function wouldn't work with meta tensors?
Adding some tests for compressing meta models, as well as for using the compressors with `is_meta=True`, would help with this.
Also, are the changes from here required?
Review thread: src/compressed_tensors/compressors/quantized_compressors/pack_quantized.py
Spoke more with @shanjiaz and clarified some things. Tests are correct and passing, and the logic looks correct. Nice job!
nice job
Summary:
This PR adds model compression/decompression support for models instantiated on the meta device, and updates downstream dependencies as well. Specifically, it touches the following (a sketch of the underlying idea follows this list):
- Sparse24BitMaskCompressor
- Quantized Compressors
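A minimal sketch of the underlying idea (the helper name here is hypothetical, my own invention rather than the library's API): for a fixed scheme like 2:4 bitmask sparsity, the shapes of the compressed tensors are pure functions of the input shape, so compression can be described without ever touching weight data.

```python
import torch

def sparse24_bitmask_meta_shapes(weight: torch.Tensor):
    """Hypothetical helper: compressed-tensor shapes for 2:4 bitmask sparsity.

    2:4 sparsity keeps 2 of every 4 values, and the bitmask stores one bit
    per original element, packed into bytes.
    """
    rows, cols = weight.shape
    values = torch.empty(rows, cols // 2, dtype=weight.dtype, device="meta")
    bitmask = torch.empty(rows, (cols + 7) // 8, dtype=torch.uint8, device="meta")
    return values, bitmask

w = torch.empty(128, 256, dtype=torch.bfloat16, device="meta")
values, bitmask = sparse24_bitmask_meta_shapes(w)
print(values.shape, bitmask.shape)  # torch.Size([128, 128]) torch.Size([128, 32])
```

The quantized compressors admit the same treatment: for example, packing 4-bit values into int32 fits eight values per word, so the packed shape is again pure shape arithmetic.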
Test:
Tested with `pytest tests/test_compressors` in compressed-tensors and `pytest tests/quantization/compressed_tensors_integration/` in transformers; all tests passed. The `_skipped` tests in llm-compressor now pass; will re-enable them after the transformers PR merges. Nightly & e2e all pass as well : )